Comparative Cross-Platform Performance Results from a Parallelizing SML Compiler
Authors
Abstract
We have developed a compiler for Standard ML which allows instantiation of a fixed set of higher order functions with equivalent parallel algorithmic skeletons written in C and MPI. The compiler is intended to be both portable and semi-automatic. Here we discuss the performance of the code generated by the compiler, for five exemplar programs on six MIMD parallel machines. Results suggest acceptable performance and consistent cross-platform behaviour.

1 Compiler overview

We have developed a parallelising compiler [10] for a pure functional subset of Standard ML in which the sole source of parallelism is indicated by specific higher order functions (HOFs). These are translated into parallel algorithmic skeletons [3] implemented in C linked with an MPI library [8]. The main objective of this work is to build a compiler for which all parallelism is implicit. Our use of algorithmic skeletons allows us to partially achieve this goal. On the one hand, the selection and parallel implementation of the higher-order functions is automatic. On the other hand, the programmer is constrained to use the particular set of HOFs provided by our compiler. We have focused on two HOFs, map and fold:

    fun map f [] = []
      | map f (h::t) = f h :: map f t

    fun foldr b f [] = b
      | foldr b f (h::t) = f h (foldr b f t)

map applies a function f to each element of a list, and fold, shown here in its right-to-left (foldr) form, applies a function f "between" elements of a list. These functions ...
Similar Resources
Empirical Parallel Performance Prediction from Semantics-Based Profiling
The PMLS parallelizing compiler for Standard ML is based upon the automatic instantiation of algorithmic skeletons at sites of higher order function (HOF) use. Rather than mechanically replacing HOFs with skeletons, which in general leads to poor parallel performance, PMLS seeks to predict run-time parallel behaviour to optimise skeleton use. Static extraction of analytic cost models from progr...
Full Text

A Portable Parallelizing Compiler with Loop Partitioning
Multithreaded programming support seems to be the most obvious approach to helping programmers take advantage of operating system parallelism. In this paper, we present the design and implementation of a portable FORTRAN parallelizing compiler (PFPC) with loop partitioning on our AcerAltos-10000 multiprocessor system, running an OSF/1 multithreaded OS. In order to port the PFPC to the system en...
Full Text

2 CML Overview
Both the implementation and the user’s view of eXene rely heavily on the concurrency model provided by CML. CML is based on the sequential language SML [MTH90, MT91] and inherits the following good features of SML: functions as first-class values, strong static typing, polymorphism, datatypes and pattern matching, lexical scoping, exception handling and a state-of-the-art module facility. The s...
Full Text

A Compilation Approach for Fortran 90D/HPF Compilers
This paper describes a compilation approach for a Fortran 90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems. Different from Fortran 77 parallelizing compilers, a Fortran 90D/HPF compiler does not parallelize sequential constructs. Only parallelism expressed by Fortran 90D/HPF parallel constructs is exploited. The methodology of parallelizing Fortran programs suc...
Full Text

Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler
This paper describes automatic coarse grain parallel processing on a shared memory multiprocessor system using a newly developed OpenMP backend of the OSCAR multigrain parallelizing compiler, targeting systems from single-chip multiprocessors to high performance multiprocessors and heterogeneous supercomputer clusters. The OSCAR multigrain parallelizing compiler exploits coarse grain task parallelism and near fine gra...
Full Text